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Abstract We describe an implementatìon of the Farfel FORTRAN AD extensions 
QQ ■ (Radul et al., 2012). These extensions integrate forward and reverse AD directly into 

the programming model, with attendant benefits to flexibility, modularity, and ease 
of use. The implementatìon we describe is a "prepreprocessor" that generates input 
to existing FORTRAN-based AD tools. In essence, blocks of code which are targeted 
for AD by Farfel constructs are put into subprograms which capture their lexical 
Y^ \ variable context, and these are closure-converted into top-level subprograms and 

specialized to eliminate external arguments, rendering them amenable to existing 
AD preprocessors, which are then invoked, possibly repeatedly if the AD is nested. 

(N ■ 
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Q ; 1 Introductìon 

The Forward And Reverse Fortran Extension Language (Farfel) extensions to 

Fortran enable smooth and modular use of AD (Radul et al., 2012). A variety 

l> ■ of implementatìon strategies present themselves, ranging from (a) deep integration 

'V^ I into a Fortran compiler, to (b) a preprocessor that performs the requested AD 

Cd ■ and generates FORTRAN or some other high-level language, to (e) a prepreprocessor 

which itself does no AD but generates input to an existing FORTRAN AD preproces- 
sor This last strategy leverages existing FORTRAN-based AD tools and compilers. 
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avoiding re-implementation of the AD transfomiations, at the expense of inheriting 
some of the Hmitations of the AD tool it invokes. This is the technique we will focus 
upon. 

Farfallen transforms Farfel input into Fortran, and invokes an existing 
AD system (Bischof et al., 1992; Hascoét and Pascual, 2004) to generate the needed 
derivatives. The process can make use of a variety of existing FORTRAN-based AD 
preprocessors, making it easy for the programmerto switch between them. There is 
significant semantic mismatch between Farfel and the AD operations allowed by 
the AD systems used, necessitating rather dramatic code transformations. When the 
Farfel program involves nested AD, the transformations and staging become even 
more involved. Viewed as a whole, this tool automates the task of applying AD, 
including the detailed maneuvers required for nested application of existing tools, 
thereby extending the reach and utility of AD. 

The remainder of the paper is organized as follows: Section 2 reviews the FAR- 
FEL extensions. A complete example program on page 4 illustrates their use. Sec- 
tion 3 describes the implementation in detail using this example program. Section 4 
summarizes work's contributions. 



2 Language Extensions 

Farfel provides two principal extensions to Fortran: syntax for AD and for 
nested subprograms. 

Extension 1: AD Syntax 

Farfel adds the adf construct for forward AD: 



ADF (TANGENTI vor) = expr ...) 

statements 

END ADF (ygr = TANGENT(var) ...) 



Multiple opening and closing assignments are separated by commas. Independent 
variables are listed in the "calls" to tangent on the left-hand sides of the opening 
assignments and are given the specified tangent values. Dependent variables appear 
in the "calls" to tangent on the right-hand sides of the closing assignments and 
the corresponding tangent values are assigned to the indicated destination variables. 
The adf construct uses forward AD to compute the directional derivative of the 
dependent variables at the point specified by the vector of independent variables in 
the direction specified by the vector of tangent values for the independent variables 
and assigns it to the destination variables. 

An analogous FARFEL construct supports reverse AD: 



ADR(COTANGENT(var) = expr_ ...) 

statements 

END ADR (var = COTANGENT (var) ...) 



Dependent variables are listed in the "calls" to cotangent on left-hand sides of 
the opening assignments and are given the specified cotangent values as inputs to 
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the reverse phase. Independent variables appear in the "calls" to cotangent on the 
right-hand sides of the closing assignments and the corresponding cotangent values 
at the end of the reverse phase are assigned to the indicated destination variables. 
The ADR construct uses reverse AD to compute the gradient with respect to the in- 
dependent variables at the point specified by the vector of independent variables in- 
duced by the specified gradient with respect to the dependent variables, and assigns 
it to the destination variables. The expressions used to initialize the cotangent inputs 
to the re verse phase are evaluated at the end of the forward phase, even though they 
appear textually prior to the statements specifying the forward phase. This way, the 
direction input to the reverse phase can depend on the result of the forward phase. 

For both adf and adr, implied-DO syntax is used to allow arrays in the opening 
and closing assignments. By special dispensation, the statement adf (var) is inter- 
preted as ADF(TANGENT(vflr)=i) and adr (var) as adr (cotangent (var) =i) . 

Extension 2: Nested Subprograms 

In order to conveniently support distinctions between different variables of differ- 
entiation for distinct invocations of AD, as in the example below, we borrow from 
Algol 60 (Backus et al., 1963) and generalize the Fortran "statement function" 
construct by allowing subprograms to be defined inside other subprograms, with 
lexical scope. As in Algol 60, the scope of parameters and declared variables is 
the locai subprogram, and these may shadow identifiers from the surrounding scope. 
Implicitly declared variables have the top-level subprogram as their scope. 

Concrete Example 

In order to describe the implementation of the above constructs, we employ a con- 
crete example. The task is to find an equilibrium {a* ,b*) of a two-player game with 
continuous scalar strategies a and b and given payoff functions A and B. The method 
is to find roots of 

a* = argmaxA(fl,argmaxB(fl*,è)) (1) 

a b 

The full program is given, for reference, in Listing 1. The he art of the program is 
the implementation eqlbrm of (1). Note that this whole program is only 63 lines of 
code, with plenty of modularity boundaries. This code is used as a running example 
for the remainder of the paper 



3 Implementation 

Farfel is implemented by the Farfallen preprocessor The current version is 
merely a proof of concept, and not production quality: it does not accept the en- 
tire Fortran77 language, and does not scale. However, its principles of opera- 
tion will be unchanged in a forthcoming production-quality implementation. Here 
we describe the reduction of Farfel constructs to FORTRAN, relying on existing 
FORTRAN-based AD tools for the actual derivative transformations. 
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Listing 1 Complete example Farfel program: equilibria of a continuous-strategy game. 

C ASTAR & BSTAR: GUESSES IN, OPTIMIZED VALUES OUT 
SUBROUTINE EQLBRM (BIGA, HIGH, ASTAR, BSTAR, N) 
EXTERNAL BIGA, BIGB 
FUNCTION F (ASTAR) 
FUNCTION G(A) 
FUNCTION H (B) 
H = BIGB (ASTAR, B) 
END 
BSTAR = ARGMAX(H, BSTAR, N) 
G = BIGA (A, BSTAR) 
END 
F = ARGMAX(G, ASTAR, N) -ASTAR 
END 
ASTAR = ROOT(F, ASTAR, N) 
END 

FUNCTION ROOT (F, XO, N) 

X = XO 

DO 1669 1 = 1, N 

CALL DERIV2 (F, X, Y, YPRIME) 
1669 X = X-Y/YPRIME 
ROOT = X 
END 

SUBROUTINE DERIV2 (F, X, Y, YPRIME) 

EXTERNAL F 

ADF(X) 

Y = F (X) 

END ADF(YPRIME = TANGENT(Y)) 
END 

FUNCTION ARGI^lAX (F, XO, N) 

FUNCTION FPRIME (X) 

FPRIME = DERIVI (F, X) 

END 
ARGMAX = ROOT (FPRIME, XO, N) 
END 

FUNCTION DERIVI (F, X) 

EXTERNAL F 

ADF(X) 

Y = F (X) 

END ADF(DERIV1 = TANGENT(Y)) 
END 

FUNCTION GMBIGA (A, B) 
PRICE = 20-0.1*A-0.1*B 
COSTS = A* (10-0. 05*A) 
GMBIGA = A*PRICE-COSTS 
END 

FUNCTION GMBI GB (A, B) 
PRICE = 20-0.1*B-0.0999*A 
COSTS = B* (10. 005-0. 05*B) 
GMBIGB = B*PRICE-COSTS 
END 

PROGRAM MAIN 

READ *, ASTAR 
READ *, BSTAR 
READ *, N 

CALL EQLBRM (GMBIGA, GMBIGB, ASTAR, BSTAR, N) 

PRINT *, ASTAR, BSTAR 

END 
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Farfel introduces two new constructs into FORTRAN: nested subprograms and 
syntax for requesting AD. We implement nested subprograms by extracting them 
to the top level, and communicating the free variables from the enclosing subpro- 
gram by passing them as arguments into the new top-level subprogram. This is an 
instance of closure conversion, a standard class of techniques for converting nested 
subprograms to top-level ones (Johnsson, 1985). In order to accommodate pass- 
ing formerly-free variables as arguments, we must adjust ali the cali sites of the 
formerly-nested subprogram; we must specialize ali the subprograms that accept 
that subprogram as an extemal to also accept the extra closure parameters; and ad- 
just ali cali sites to ali those specialized subprograms to pass those extra parameters. 

We implement the AD syntax by constructing new subroutines that correspond to 
the statements inside each adf or adr block, arranging to cali the AD tool of choice 
on each of those new subroutines, and transforming the block itself into a cali to the 
appropriate tool-generated subroutine. 

Nested Subprograms in Detail 

Let US illustrate closure conversion on our example. Recali argmax in our example 
program: 



FUNCTION ARGMAX (F, XO 


, N) 




FUNCTION FPRIME (X) 






FPRIME 


= DERIVI (F, 


X) 




END 








ARGMAX = 


ROOT (FPRIME, 


XO, 


N) 


END 









This contains the nested function fprime. We closure convert this as follows. First, 
extract fprime to the top level: 



FUNCTION ARGMAX_FPRIME (X, F) 








ARGMAX_FPRIME = DERIVI (F, X) 








END 








FUNCTION ARGMAX (E, XO, N) 








ARGMAX = R00T_1 (ARGMAX_FPRIME, 


F, 


XO, 


N) 


END 









Note the addition of a closure argument for f since it is freely referenced in fprime, 
and the addition of the same closure argument at the cali site, since fprime is passed 
as an external to root. Then we specialize root to create a version that accepts the 
needed set of closure arguments (in this case one): 



FUNCTION R00T_1 (E, FI, XO , 


N) 


X = XO 




DO 1669 1=1, N 




CALL DERIV2 1(F, FI, X, Y, 


YPRIME) 


1669 X = X-Y/YPRIME 




ROOT 1 = X 




END 





Since ROOT contained a cali to deriv2, passing it the extemal passed to root, we 
must also specialize deriv2: 

SUBROUTINE DERIV2_1 (F, FI, X, Y, YPRIME) 
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EXTERNAL F 
ADF(X) 

Y = F (X, FI) 

END ADF(YPRIME = TANGENT(Y)) 

END 



We must, in general, copy and specialize the portion of the cali graph where the 
nested subprogram travels, which in this case is just two subprograms. During such 
copying and specialization, we propagate extemal constants (e.g., fprime through 
the cali to root_i, the cali to deriv2_i, and the cali site therein) allowing the 
elimination of the external declaration for these values. This supports AD tools 
that do not allow taking derivatives through calls to external subprograms. 

That is the process for handling one nested subprogram. In our example, the 
same is done for f in eqlbrm, g in f, and h in g. Doing so causes the introduction 
of a number of closure arguments, and the specialization of a number of subpro- 
grams to accept those arguments; including perhaps further specializing things that 
have already been specialized. The copying also allows a limited forni of subpro- 
gram reentrancy: even if recursion is disallowed (as in traditional FORTRAN77) our 
nested uses of argmax will cause no difficulties because they will end up calling two 
different specializations of argmax. 

Note that we must take care to prevent this process from introducing spurious 
aUases. For example, in eqlbrm, the internai function f that is passed to root closes 
over the iteration count n, which is also passed to root separately. When specializ- 
ing ROOT to accept the closure parameters of f, we must not pass n to the specializa- 
tion of ROOT twice, lest we run afoul of FORTRAN's prohibition against assigning to 
aliased values. Fortunately, such situations are syntactically apparent. 

Finally, specialization leaves behind unspecialized (or underspecialized) versions 
of subprograms, which may now be unused, and if so can be eliminated. In this 
case, that includes root, deriv2, argmax, fprime, and derivi from the originai 
program, as well as some intermediate specializations thereof. 

AD Syntax in Detail 

We implement the AD syntax by first canonicalizing each adf or adr block to be 
a single cali to a (new, internai) subroutine, then extracting those subroutines to the 
top level, then rewriting the block to be a cali to an AD-transformed version of the 
subroutine, and then arranging to cali the AD tool of choice on each of those new 
subroutines to generate the needed derivatives. 

Retuming to our example program, closure conversion of nested subprograms 
produced the following specialization of derivi: 



function DERIV1_1 (ASTAR, BIGA, 


BIGB, 


BSTAR, 


N, X) 


ADF(X) 








Y = EQLBRM_F_G (X, ASTAR, BIGA, 


BIGB, 


BSTAR, 


N) 


END ADF(DERIV1_1 = TANGENT(Y)) 








END 









which contains an adf block. We seek to convert this into a form suitable for invok- 
ing the AD preprocessor We first canonicalize by introducing a new subroutine to 
capture the statements in the adf block, producing the following: 
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FUNCTION DERIVI 


1 (ASTAR, 


BIGA, 


BIGB, 


BSTAR, N 


X) 


SUBROUTINE ADFl () 












Y = EQLBRM F 


G (X, 


ASTAR 


, BIGA 


, BIGB 


, BSTAR, 


N) 


END 














ADF(X) 














CALL ADFl 














END ADF(DERIV1_ 


_1 = 


TANGENT(Y) ) 








END 















Extracting the subroutine adfi to the top level as before yields the following: 



SUBROUTINE DERIV1_1_ADF1 (X, ASTAR, BIGA, BIGB, BSTAR, N, Y) 

Y = G(X, ASTAR, BIGA, BIGB, BSTAR, N) 

END 

FUNCTION DERIV1_1 (ASTAR, BIGA, BIGB, BSTAR, N, X) 

ADF(X) 

CALL DERIV1_1_ADF1 (X, ASTAR, BIGA, BIGB, BSTAR, N, Y) 

END ADF(DERIV1_1 = TANGENT(Y)) 

END 



Now we are properly set up to rewrite the adf block into a subroutine cali — 
specifically, to a subroutine that will be generateci from derivi_i_adfi by AD. 
The exact result depends on the AD tool that will be used to construct the derivative 
of DERivi_i_ADFi ; for Tapenade, the generated code looks like this: 



FUNCTION DERIV1_1 (ASTAR, BIGA, 


BIGB, 


BSTAR, N, 


X) 




X GÌ = 1 










ASTAR GÌ = 










BSTAR GÌ = 










CALL DERIV1_1_ADF1_G1 (X, X_G1, 


ASTAR 


ASTAR_G1 


BIGA, 


BIGB, 


+BSTAR, BSTAR GÌ, N, Y, DERIVI 


_1) 








END 











Different naming conventions are used for derivi_i_adfi_gi when generating 
code for Adifor; the parameter passing conventions of TAPENADE and Adifor 
agree in this case. Farfallen maintains the types of variables in order to know 
whether to generate variables to hold tangents and cotangents (which are initialized 
to zero if they were not declared in the relevant opening assignments.) 

The same must be repeated for each adf and adr block; in our example there are 
five in ali: two in specializations of derivi and three in specializations of deriv2. 
We must also specialize eqlbrm and its descendants in the cali graph, by the process 
akeady illustrated, to remove external calls to the objective functions, for the reasons 
described earlier 

Finally, we must invoke the user's preferred AD tool to generate ali the needed 
derivatives. Here, FARFALLEN might invoke TAPENADE as follows: 



#.' /bin/sh 

tapenade -root derivl_2_adf 2 -d -o eqlbrm42 -diffvarname "_g2"\ 

-dif f funcname "_g2 " eqlbrm42.f 
tapenade -root deriv2_l_2_adf 4 -d -o eqlbrm42 -diffvarname "_g4"\ 

-dif f funcname "_g4" eqlbrm42 { ,_g2 } . f 
tapenade -root derivl_l_adf 1 -d -o eqlbrm42 -diffvarname "_gl"\ 

-dif f funcname "_gl" eqlbrm42 { ,_g2 ,_g4 ) . f 
tapenade -root deriv2_l_l_adf 3 -d -o eqlbrm42 -diffvarname "_g3"\ 

-dif f funcname "_g3" eqlbrm42 { ,_g2 ,_g4,_gl } . f 
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tapenade -root deriv2_2_adf 5 -d -o eqlbrm42 -diffvarname "_g5"\ 
-dif f funcname "_g5" eqlbrm42 { ,_g2 ,_g4,_gl , _g3 } . f 



We must take care that multiple invocations of the AD tool to generate the various 
derivatives occur in the proper order, which is computatale from the cali graph of 
the program, to ensure that generated derivative codes that are used in subprograms 
to be differentiated are available to be transformed. For ex ampie, ali the derivatives 
of EQLBRM_F_G_H needed for its optimizations must be generated before generating 
derivatives of (generated subprograms that cali) eqlbrm_f_g. We must also take 
care to invoke the AD tool with different prefixes/suffixes, so that variables and 
subprograms created by one differentiation do not clash with those made by another. 

Performance 

We tested the performance of code generated by Farfallen in conjunction with 
Tapenade and with Adifor. We executed Farfallen once to generate FOR- 
TRAN77 source using TAPENADE for the automatic differentiation, and separately 
using Adifor for the AD. In each case, we compiled the resulting program 
(gfo rtran 4.6.2-9, 64-bitDebiansid, -Of ast - fwho le -program, single pre- 
cision) with A^ = 1000 iterations at each level and timed the execution of the binary 
on a 2.93GHz Intel i7 870. For comparison, we translated the same computation 
into VLAD (Pearlmutter and Siskind, 2008) (Fig. 1), compiled it with Stalin- 
GRAD (Siskind and Pearlmutter, 2008), and ran it on the same machine. Stal- 
INGRAD has the perhaps unfair advantage of being an optimizing compiler with 
integrated support for AD, so we are pleased that FARFALLEN was able to achieve 
performance that was nearly competitive. 



Tapenade Adifor Stalingrad 
6.97 8.92 5.83 

The VLAD code in Fig. 1 was written with the same organization, variable names, 
subprogram names, and parameter names and order as the corre sponding Farfel 
code in Listing 1 to help a reader unfamiliar with SCHEME, the language on which 
VLAD is based, understand how Farfel and VLAD both represent the same essen- 
tial notions of nested subprograms and the AD discipline of requesting derivatives 
precisely where they are needed. Because functional-programming languages, like 
SCHEME and VLAD, prefer higher-order functions (i.e., operators) over the block- 
based style that is prevalent in FORTRAN, AD is invoked via the J and J operators 
rather than via the adf and adr statements. However, there is a one-to-one corre- 
spondence between 



t:/: 



an operator that takes a function / as input, along with a primal argument x and 
tangent argument i and returns both a primal result y and a tangent result y, and the 
following: 



ADF (TANGENT (X) = x) 

y = fM 

END ADF(y = TANGENT (>>)) 
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EQLBRM(A,B,ao,i>o,«) = let/(fl') =ìetg{a) =let b{b) =B(a' ,b) 

ÌnA(a,ARGMAX(/7,Èo,")) 
in ARGMAX(g, a' ,n)-a' 

a* = ROOT(/,ao,n) 
inlet/!(è) =B{a*,b) 

b* = ARGMAX(/!,Z)o,n) 

ina* ,b* 

ROOl{f,x,n) = if n = 
thenx 

else let >■,>■' = deriv2(/,x) 
inROOT(/,x-^,n- 1) 

DERIV2(/,x)= J^(/,X,1) 
ARGMAX(/,Xo,n) =let/'(x) =DERIVl(/,x) 

inROOT(/',xo,n) 

DERIVl(/,x) = letx,>'= J(/,x, 1) 
iny 

A{a,b) =\etprìce = 20-0.1 xa-Q.ìxb 
costs = ax { 10 — 0.05 x a) 
in a X price — costs 

B{a,b) = let price = 20-0.1 x Z7-0.0999 x a 
costs = bx (10.005 - 0.05 x b) 
in ò X price — costs 
letoo = READReal() 
bo = readRealO 
n = readReal() 

a*,b* = EQLBRM(A,B,ao,èo:«) 
in WRITEREAL(a*) 
WRITEREAL(è*) 



Fig. 1 Complete VLAD program for our concrete example with the same organization and func- 
tionality as the Farfel program in Listing 1 . 



Similarly, there is a one-to-one correspondence between 

J : f xxxy-^yxx 

an operator that takes a function / as input, along with a primal argument x and 
cotangent y and retums both a primal result y and a cotangent x, and 



ADR(COTANGENT(v) = V) 

.y = /U) 

END ADR(X = COTANGENT (x) ) 



The strong analogy between how the callee-derives AD discipline is represented in 
both Farfel and vlad serves two purposes: it enables the use of the Stalingrad 
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compiler technology for compiling Farfel and facilitates migration from legacy 
imperative languages to more modem, and more easily optimized, pure languages. 



4 Conclusìon 

We bave illustrated an implementation of the Farfel extensions to FORTRAN — 
nested subprograms and syntax for AD (Radul et al., 2012). These extensions en- 
able convenient, modular programming using a callee-derives paradigm of auto- 
matic differentiation. Our implementation is a preprocessor that translates FARFEL 
Fortran extensions into input suitable for an existing AD tool. This strategy en- 
ables modular, flexible use of AD in the context of an existing legacy language 
and tool chain, without sacrificing the desirable performance characteristics of these 
tools: in the concrete example, only 20%-50% slower than a dedicated AD-enabled 
compiler, depending on which FORTRAN AD system is used. 
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